Abstract — This paper presents a novel L1-norm semi-supervised learning algorithm for robust image analysis. It is based on a new L1-norm formulation of Laplacian regularization, which is the key step of graph-based semi-supervised learning. Since our L1-norm Laplacian regularization is defined directly over the eigenvectors of the normalized Laplacian matrix, we formulate semi-supervised learning as an L1-norm linear reconstruction problem that can be effectively solved with sparse coding. By working with only a small subset of eigenvectors, we further develop a fast sparse coding algorithm for our L1-norm semi-supervised learning. Owing to the sparsity induced by sparse coding, the proposed algorithm can deal with noise in the data to some extent and thus has important applications to robust image analysis, such as noise-robust image classification and noise reduction for visual and textual bag-of-words (BOW) models. In particular, this paper is the first attempt to obtain robust image representation by sparse co-refinement of visual and textual BOW models. The experimental results show the promising performance of the proposed algorithm.

Index Terms — Noise-robust image classification, visual and textual BOW refinement, L1-norm semi-supervised learning, L1-norm Laplacian regularization



I. Introduction 

Semi-supervised learning, i.e., learning from both labeled and unlabeled data, has been widely applied to many challenging image analysis tasks [1]-[6] such as image representation, image classification, and image annotation. In these tasks, the manual labeling of training data is often tedious, subjective, and expensive, while unlabeled data are much easier to obtain. By exploiting the large amount of unlabeled data under reasonable assumptions, semi-supervised learning [7]-[11] can reduce the need for expensive labeled data and thus achieve promising results, especially for community-contributed image collections (e.g., Flickr).

Among various semi-supervised learning methods, one influential line of work is graph-based semi-supervised learning [8], [9], which models the entire dataset as a graph. The basic idea behind this approach is label propagation on the graph under the cluster consistency assumption [9] (i.e., two data points on the same geometric structure are likely to have the same class label). Since the graph is at the heart of graph-based semi-supervised learning, graph construction has been extensively studied [12]-[15] in recent years. However, these graph construction methods are not developed directly for noise reduction, and the corresponding semi-supervised learning may suffer significant performance degradation due to the inaccurate labeling of data points commonly encountered in image analysis tasks. For example, image annotations may be contributed by the community (e.g., Flickr), so that only noisy tags are available.

In this paper, we focus on proposing a novel noise-robust graph-based semi-supervised learning method, rather than on the well-studied problem of graph construction. As summarized in [12], traditional graph-based semi-supervised learning can be formulated as a quadratic optimization problem based on Laplacian regularization [4], [8], [9], [11], [16]. Since the sparsity induced by L1-norm optimization can help to deal with noise in the data to some extent [17], [18], if we succeed in formulating Laplacian regularization as an L1-norm term instead, we can convert traditional semi-supervised learning to L1-norm optimization and let our new semi-supervised learning also benefit from the desirable property of sparsity. Fortunately, from the eigenvalue decomposition of the normalized Laplacian matrix $\mathcal{L}$, we can readily represent $\mathcal{L}$ in a symmetric decomposition form, which can be further used to formulate Laplacian regularization as an L1-norm term. Since all the eigenvectors of $\mathcal{L}$ are exploited in this symmetric decomposition, our new L1-norm Laplacian regularization can be considered to be explicitly formulated based upon the manifold structure of the data.

As a convex optimization problem, the above L1-norm semi-supervised learning has a unique global solution. By working only with a small subset of eigenvectors, we develop a fast sparse coding algorithm for our L1-norm semi-supervised learning. In this paper, we only adopt the fast iterative shrinkage-thresholding method [19] for sparse coding, rather than any of the many other L1-norm optimization methods [20]-[23]. Owing to the property of sparsity, the proposed algorithm can deal with noise in the data to some extent, as shown in our later experiments. Hence, it has important applications to robust image analysis where noisy labels are provided. In this paper, we apply the proposed algorithm to two typical image analysis tasks, i.e., noise-robust semi-supervised image classification and noise reduction for both visual and textual bag-of-words (BOW) models. Although only tested in these two applications, the proposed algorithm can be extended to other image analysis tasks, given that semi-supervised learning has been widely used in the literature.

To emphasize the main contributions of this paper, we summarize the following distinct advantages of our novel L1-norm semi-supervised learning:

• We have made the first attempt to formulate Laplacian regularization as an L1-norm term explicitly based upon the manifold structure of the data.

• Our L1-norm semi-supervised learning algorithm has been shown to achieve significant improvements in robust image analysis where noisy labels are provided.

• Our new L1-norm Laplacian regularization can be similarly applied to many other difficult problems, considering the wide use of Laplacian regularization.

• This is the first attempt to obtain robust image representation by sparse co-refinement of visual and textual BOW models for community-contributed image collections.

The remainder of this paper is organized as follows. Section II provides a brief review of related work. In Section III, we propose a fast L1-norm semi-supervised learning algorithm by defining a novel L1-norm Laplacian regularization. In Section IV, the proposed algorithm is applied to two robust image analysis tasks: noise-robust image classification and sparse co-refinement of visual and textual BOW models. In Section V, we present the experimental results to evaluate the proposed algorithm. Finally, Section VI gives the conclusions.

II. Related Work 

In this paper, we attempt to formulate graph-based semi-supervised learning as L1-norm optimization so that it can benefit from the property of sparsity and thus deal with noise in the data to some extent. This is quite different from the attempt to construct a graph with sparse representation [3], [13], [14] for graph-based semi-supervised learning. Although both attempts exploit L1-norm optimization for semi-supervised learning, a new L1-norm semi-supervised learning method is proposed in the present paper, while traditional semi-supervised learning was still used in [3], [13], [14]. In fact, the graph constructed with sparse representation can be readily applied to our new L1-norm semi-supervised learning algorithm.

To formulate semi-supervised learning as an L1-norm optimization problem, we give a new L1-norm interpretation of Laplacian regularization explicitly based upon the manifold structure of the data. This L1-norm Laplacian regularization clearly distinguishes our semi-supervised learning algorithm from another L1-norm semi-supervised learning algorithm proposed in [24], which directly adopts Lasso [25] for semi-supervised learning and completely ignores the important Laplacian regularization that has been widely used for graph-based semi-supervised learning in the literature [4], [8], [9], [11], [16]. In fact, our L1-norm semi-supervised learning algorithm is shown to significantly outperform [24] (see the later experimental results). Moreover, although both Laplacian regularization and L1-norm optimization have also been used in [5], [26], the Laplacian regularization term in their objective functions is still quadratic, which is quite different from our new L1-norm Laplacian regularization.

Since our new L1-norm Laplacian regularization is defined directly over the eigenvectors of the normalized Laplacian matrix, we can formulate semi-supervised learning as an L1-norm linear reconstruction problem in the framework of sparse coding [17], [18]. Moreover, by working with only a small subset of eigenvectors, we can develop a fast sparse coding algorithm for our L1-norm semi-supervised learning, which is efficient even for robust image analysis tasks where the datasets are often large. Although there exist other L1-norm generalizations [27], [28] of Laplacian regularization, they are not defined based upon the eigenvectors, and the corresponding sparse coding algorithms incur too large a time cost.

Considering the distinct advantage of our L1-norm semi-supervised learning (i.e., noise robustness, as shown in later experiments), our original motivation is to apply it to robust image analysis where noisy labels are provided. In particular, to the best of our knowledge, we have made the first attempt to obtain robust image representation by sparse co-refinement of visual and textual BOW models. This strategy is extremely important for the success of robust image analysis on community-contributed image collections (e.g., Flickr), because it is rather difficult to generate accurate visual vocabularies and obtain clean image tags in such a complicated case. However, most previous methods in the literature cannot deal with visual and textual BOW refinement simultaneously. For example, various supervised [29], [30] and unsupervised [31], [32] methods have been developed specifically for visual vocabulary optimization, while in [?], [33], [34] only tag refinement is considered for robust image analysis. A more detailed comparison to these methods can be found in Section IV-B.

III. L1-Norm Semi-Supervised Learning

In this section, we first give a brief review of graph-based semi-supervised learning. To address the problem associated with this semi-supervised learning, we then present a new L1-norm formulation of Laplacian regularization. Finally, based on this L1-norm Laplacian regularization, we develop a fast L1-norm semi-supervised learning algorithm.

A. Graph-Based Semi-Supervised Learning 

To introduce graph-based semi-supervised learning, we first formulate a semi-supervised learning problem as follows. Here, we only consider the two-class problem; the multi-class problem can be handled in the same way as in [9]. Given a dataset $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$ and a label set $\{1, -1\}$, the first $l$ data points $x_i$ ($i \le l$) are labeled as $y_i \in \{1, -1\}$ and the remaining data points $x_u$ ($l + 1 \le u \le n$) are unlabeled with $y_u = 0$. The goal of semi-supervised learning is to predict the labels of the unlabeled data points, i.e., to find a vector $f = [f_1, \ldots, f_n]^T$ corresponding to a classification on the dataset $X$ by labeling each data point $x_i$ with the label $\mathrm{sign}(f_i)$, where $\mathrm{sign}(\cdot)$ denotes the sign function. Let $y = [y_1, \ldots, y_n]^T$; we can readily observe that $y$ is exactly consistent with the initial labels according to this decision rule.

To solve the above problem by graph-based semi-supervised learning, we need to model the whole dataset as a graph $\mathcal{G} = \{\mathcal{V}, W\}$ with vertex set $\mathcal{V} = X$ and weight matrix $W = [w_{ij}]_{n \times n}$, where $w_{ij}$ denotes the similarity between $x_i$ and $x_j$. The weight matrix $W$ is assumed to be nonnegative and symmetric. For example, we usually define $W$ as

$$w_{ij} = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right), \tag{1}$$
Fig. 1. (a) The two-moons toy dataset with each class having one incorrectly labeled data point (out of five) initially. (b) The classification results on this toy dataset by one typical graph-based semi-supervised learning method proposed in [9]. We can clearly observe the severe problem of noise diffusion associated with traditional semi-supervised learning when noisy initial labels are provided.



where $\sigma$ is a free bandwidth parameter that can be determined empirically. Alternatively, to eliminate the need to tune this parameter, we can adopt the graph construction methods reported in [12]-[14]. Based on the weight matrix $W$, we compute the normalized Laplacian matrix $\mathcal{L}$ of the graph $\mathcal{G}$ by

$$\mathcal{L} = I - D^{-1/2} W D^{-1/2}, \tag{2}$$

where $I$ is an $n \times n$ identity matrix, and $D$ is an $n \times n$ diagonal matrix whose $i$-th diagonal element equals the sum of the $i$-th row of $W$ (i.e., $d_{ii} = \sum_j w_{ij}$).
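Equations (1) and (2) can be sketched in NumPy as below. This is a minimal dense illustration for small $n$, not the authors' implementation; the function names are hypothetical, and setting $w_{ii} = 0$ (no self-loops) is our assumption, since equation (1) does not say how the diagonal is treated.

```python
import numpy as np


def gaussian_weights(X, sigma):
    """Dense weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) as in equation (1).

    Assumption: the diagonal is zeroed (no self-loops); the text does not specify w_ii.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} as in equation (2), with d_ii = sum_j w_ij."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes no isolated vertex, i.e. every d_ii > 0
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
W = gaussian_weights(X, sigma=1.0)
L = normalized_laplacian(W)
```

The resulting $\mathcal{L}$ is symmetric, its eigenvalues lie in $[0, 2]$, and the vector $D^{1/2}\mathbf{1}$ lies in its null space, which gives quick sanity checks for any implementation.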

In this paper, we focus on one typical graph-based semi-supervised learning method proposed in [9]. Its objective function can be defined as follows:

$$Q(f) = \frac{1}{2}\|f - y\|_2^2 + \frac{\lambda}{2} f^T \mathcal{L} f, \tag{3}$$

where $\lambda > 0$ is a regularization parameter. Then the classification function is given by

$$f^* = \arg\min_f Q(f). \tag{4}$$

The first term of $Q(f)$ is the fitting constraint, which means that a good classification function should not change too much from the initial label assignment. The second term is the smoothness constraint, which means that a good classification function should not change too much between nearby data points. The trade-off between these two competing constraints is captured by the positive parameter $\lambda$. It should be noted that the smoothness constraint is exactly the well-known Laplacian regularization [8], [9], [11], [16], which has been widely used for semi-supervised learning.
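Because both terms of equation (3) are quadratic, setting the gradient $(f - y) + \lambda \mathcal{L} f$ to zero gives the closed form $f^* = (I + \lambda \mathcal{L})^{-1} y$. The sketch below computes it with a linear solve (avoiding an explicit inverse) and applies the decision rule $\mathrm{sign}(f_i)$; the function name `l2_ssl` and the toy path graph are illustrative assumptions rather than part of the method as described.

```python
import numpy as np


def l2_ssl(L, y, lam):
    """Minimizer of Q(f) = 1/2 ||f - y||^2 + lam/2 f^T L f (equation (3)).

    The optimality condition (I + lam L) f = y is solved directly.
    """
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + lam * L, y)


# Toy example: a path graph 0-1-2-3-4-5 with one label at each end.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
d = W.sum(axis=1)
L = np.eye(n) - W / np.sqrt(np.outer(d, d))    # equation (2)
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])  # y_u = 0 for unlabeled points
lam = 2.0
f = l2_ssl(L, y, lam)
labels = np.sign(f)                            # decision rule sign(f_i)
```

Since this graph is mirror-symmetric and the labels are antisymmetric, the solution satisfies $f_i = -f_{n+1-i}$, so the two labels propagate symmetrically toward the middle of the path.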

However, in the literature, the original motivation for developing these semi-supervised learning methods is to exploit both labeled and unlabeled data, not to deal with noise in the data. As a result, they are not well suited to challenging tasks (e.g., robust image analysis) where noisy initial labels are provided. To clearly show this disadvantage, we give a toy example in Fig. 1. We can observe that the negative effect of noisy labels is severely diffused by traditional semi-supervised learning. Hence, our main motivation is to develop a new semi-supervised learning method that can suppress the negative effect of noisy labels. As shown in the following, the problem illustrated in Fig. 1 can be effectively handled by L1-norm Laplacian regularization.



B. L1-Norm Laplacian Regularization

As reported in [17], [18], the sparsity induced by L1-norm optimization can help to deal with noise in the data to some extent. If we succeed in formulating Laplacian regularization as an L1-norm term instead, we can convert traditional semi-supervised learning to L1-norm optimization and let our new semi-supervised learning also benefit from the property of sparsity (i.e., suppress the negative effect of noisy labels). Hence, in the following, we focus on the L1-norm formulation of Laplacian regularization.

Considering the important role that the normalized Laplacian matrix $\mathcal{L}$ plays in Laplacian regularization, we first give a symmetric decomposition of $\mathcal{L}$. As a positive semidefinite matrix, $\mathcal{L}$ can be decomposed into

$$\mathcal{L} = V \Sigma V^T, \tag{5}$$

where $V$ is an $n \times n$ orthonormal matrix with each column being an eigenvector of $\mathcal{L}$, and $\Sigma$ is an $n \times n$ diagonal matrix whose diagonal element $\Sigma_{ii}$ is an eigenvalue of $\mathcal{L}$ (sorted as $0 \le \Sigma_{11} \le \cdots \le \Sigma_{nn}$). Furthermore, we represent $\mathcal{L}$ in the following symmetric decomposition form:

$$\mathcal{L} = (\Sigma^{1/2} V^T)^T (\Sigma^{1/2} V^T) = B^T B, \tag{6}$$

where $B = \Sigma^{1/2} V^T$. Since $B$ is computed with all the eigenvectors of $\mathcal{L}$, we can regard $B$ as being explicitly defined based upon the manifold structure of the data.
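Equations (5) and (6) translate directly into code. The sketch below (hypothetical helper `laplacian_factor`) uses a dense symmetric eigensolver, so it is only meant for small $n$; clipping tiny negative eigenvalues produced by floating-point round-off before taking square roots is our own safeguard, not something the text prescribes. It also checks that $f^T \mathcal{L} f = \|Bf\|_2^2$, which follows from $\mathcal{L} = B^T B$.

```python
import numpy as np


def laplacian_factor(L):
    """Return B = Sigma^{1/2} V^T such that L = V Sigma V^T = B^T B (equations (5)-(6))."""
    evals, V = np.linalg.eigh(L)          # ascending eigenvalues, orthonormal eigenvectors
    evals = np.clip(evals, 0.0, None)     # guard against tiny negative round-off
    B = np.sqrt(evals)[:, None] * V.T     # row i of V^T scaled by sqrt(Sigma_ii)
    return B, evals, V


rng = np.random.default_rng(1)
n = 8
A = rng.random((n, n))
W = (A + A.T) / 2.0                       # nonnegative, symmetric weights
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.eye(n) - W / np.sqrt(np.outer(d, d))
B, evals, V = laplacian_factor(L)
f = rng.normal(size=n)
```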

We further directly utilize $B$ to define a new L1-norm smoothness measure, instead of the traditional smoothness measure used as Laplacian regularization for semi-supervised learning. In spectral graph theory, the smoothness of a vector $f \in \mathbb{R}^n$ is measured by $\Omega(f) = f^T \mathcal{L} f$, which is exactly the smoothness constraint in equation (3). Different from $\Omega(f)$, in this paper the L1-norm smoothness of a vector $f \in \mathbb{R}^n$ is measured by $\hat{\Omega}(f) = \|Bf\|_1$. For this new L1-norm smoothness measure, we have the following proposition.

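Proposition 1: (i) If $\hat{\Omega}(f) < 1$, then $\Omega(f) \le \hat{\Omega}(f)$; (ii) for an eigenvector $V_{\cdot i}$ of $\mathcal{L}$, $\hat{\Omega}(V_{\cdot i}) = \Sigma_{ii}^{1/2}$; (iii) if $f = V\alpha = \sum_{i=1}^n \alpha_i V_{\cdot i}$, then $\hat{\Omega}(f) = \sum_{i=1}^n |\alpha_i| \Sigma_{ii}^{1/2}$.

Proof: (i) If $\hat{\Omega}(f) < 1$, then $\Omega(f) = f^T \mathcal{L} f = f^T B^T B f = \sum_{i=1}^n (B_{i\cdot} f)^2 \le \left(\sum_{i=1}^n |B_{i\cdot} f|\right)^2 = (\hat{\Omega}(f))^2 \le \hat{\Omega}(f)$, where $B_{i\cdot}$ denotes the $i$-th row of $B$; (ii) $\hat{\Omega}(V_{\cdot i}) = \|B V_{\cdot i}\|_1 = \|\Sigma^{1/2} V^T V_{\cdot i}\|_1$. Since $V$ is orthonormal, we further have $\hat{\Omega}(V_{\cdot i}) = \|[0, \ldots, 0, \Sigma_{ii}^{1/2}, 0, \ldots, 0]^T\|_1 = \Sigma_{ii}^{1/2}$; (iii) $\hat{\Omega}(f) = \left\|\sum_{i=1}^n \alpha_i [0, \ldots, 0, \Sigma_{ii}^{1/2}, 0, \ldots, 0]^T\right\|_1 = \sum_{i=1}^n |\alpha_i| \Sigma_{ii}^{1/2}$.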

Proposition 1(i) shows that our L1-norm smoothness bounds the traditional smoothness from above once we succeed in reducing the former below 1. Proposition 1(ii) shows that eigenvectors with smaller eigenvalues are smoother in terms of our L1-norm smoothness measure. Since any vector $f \in \mathbb{R}^n$ can be written as $f = V\alpha = \sum_{i=1}^n \alpha_i V_{\cdot i}$, we can conclude from Proposition 1(iii) that smooth vectors are linear combinations of the eigenvectors with small eigenvalues.
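Proposition 1 can be checked numerically on a small random graph. The sketch below is a sanity test under our own assumptions (a dense random weight matrix and the hypothetical names `omega` and `omega_l1`), not a proof; it evaluates $\Omega(f) = f^T \mathcal{L} f$ and $\hat{\Omega}(f) = \|Bf\|_1$ and tests parts (i)-(iii).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.random((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.eye(n) - W / np.sqrt(np.outer(d, d))
evals, V = np.linalg.eigh(L)
evals = np.clip(evals, 0.0, None)
B = np.sqrt(evals)[:, None] * V.T          # B = Sigma^{1/2} V^T


def omega(f):
    """Traditional smoothness f^T L f."""
    return f @ L @ f


def omega_l1(f):
    """L1-norm smoothness ||B f||_1."""
    return np.abs(B @ f).sum()


# (ii) each eigenvector V[:, i] has L1-norm smoothness sqrt(Sigma_ii)
part_ii = all(np.isclose(omega_l1(V[:, i]), np.sqrt(evals[i])) for i in range(n))

# (iii) f = V alpha gives ||B f||_1 = sum_i |alpha_i| sqrt(Sigma_ii)
alpha = rng.normal(size=n)
part_iii = np.isclose(omega_l1(V @ alpha), np.sum(np.abs(alpha) * np.sqrt(evals)))

# (i) rescale f so that ||B f||_1 = 0.5 < 1; then f^T L f <= ||B f||_1
f = V @ alpha
f = 0.5 * f / omega_l1(f)
part_i = omega(f) <= omega_l1(f)
```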

By replacing the traditional smoothness constraint (i.e., Laplacian regularization) in equation (3) with our L1-norm version, we define a new objective function for graph-based semi-supervised learning as follows:

$$Q(f) = \frac{1}{2}\|f - y\|_2^2 + \lambda \|Bf\|_1. \tag{7}$$

The first term of $Q(f)$ is the fitting constraint, while the second term is the L1-norm smoothness constraint used as Laplacian regularization. Here, it should be noted that the fitting constraint is not formulated as an L1-norm term. The reason is that most elements of $f$ tend to zero (i.e., sparsity) under $\min_f \|f - y\|_1 + \lambda \|Bf\|_1$, given that $y$ has very few nonzero elements (i.e., very few labeled data are typically provided for semi-supervised learning). In other words, the labels of data points are hardly propagated across the dataset, which completely conflicts with the original goal of semi-supervised learning. Hence, the fitting constraint of $Q(f)$ remains an L2-norm term. In the following, L1-norm semi-supervised learning (L1-SSL) refers to $\min_f Q(f)$.

It is worth noting that our L1-norm formulation of Laplacian regularization plays an important role in our interpretation of L1-norm semi-supervised learning in the framework of sparse coding. More concretely, according to Proposition 1(iii), our L1-norm semi-supervised learning can be formulated as a linear reconstruction problem by setting $f = V\alpha$. Furthermore, to solve this linear reconstruction problem efficiently, we can develop a fast sparse coding algorithm (see Section III-C) by working with only a small subset of eigenvectors (i.e., only some of the columns of $V$ are used), which is especially suitable for robust image analysis tasks where the datasets are often large. Although there exist other L1-norm generalizations [27], [28] of Laplacian regularization, which approximately take the form $\sum_{ij} w_{ij} |f_i - f_j|$, they are not explicitly defined based upon the eigenvectors of $\mathcal{L}$, and the strategy of dimension reduction is hard to apply to $f$. Hence, the sparse coding algorithms developed in [27], [28] incur too large a time cost.

Finally, we can similarly utilize $B$ to formulate the traditional Laplacian regularization as an L2-norm term

$$f^T \mathcal{L} f = (Bf)^T Bf = \|Bf\|_2^2, \tag{8}$$

which is explicitly based upon the manifold structure of the data. Accordingly, the objective function $Q(f)$ of the traditional semi-supervised learning [9] can be redefined as the sum of two L2-norm terms:

$$Q(f) = \frac{1}{2}\|f - y\|_2^2 + \frac{\lambda}{2}\|Bf\|_2^2. \tag{9}$$

In the following, the traditional semi-supervised learning [9] is called L2-norm semi-supervised learning (L2-SSL).

C. Fast L1-Norm Semi-Supervised Learning

As a convex optimization problem, our L1-norm semi-supervised learning has a unique global solution $f^* = \arg\min_f Q(f)$. Let $x = f - y$, $A = B$, and $b = -By$. The original problem $\min_f Q(f)$ for our L1-norm semi-supervised learning is then equivalently transformed into

$$\min_x \frac{1}{2}\|x\|_2^2 + \lambda \|Ax - b\|_1, \tag{10}$$

which is a new L1-norm optimization problem. Similar to [35], a log-barrier algorithm can be readily developed for this L1-norm optimization. However, the resulting log-barrier algorithm scales polynomially with the data size and thus becomes impractical for image analysis tasks.
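The change of variables behind equation (10) is easy to verify numerically: with $x = f - y$, $A = B$, and $b = -By$, we have $Ax - b = Bf$, so both objectives take the same value at corresponding points. The check below uses an arbitrary square matrix in place of $B$; this is an assumption that is harmless here, since the identity does not depend on the structure of $B$.

```python
import numpy as np


def q_l1ssl(f, y, B, lam):
    """Objective of equation (7): 1/2 ||f - y||_2^2 + lam ||B f||_1."""
    return 0.5 * np.sum((f - y) ** 2) + lam * np.abs(B @ f).sum()


def q_transformed(x, A, b, lam):
    """Objective of equation (10): 1/2 ||x||_2^2 + lam ||A x - b||_1."""
    return 0.5 * np.sum(x ** 2) + lam * np.abs(A @ x - b).sum()


rng = np.random.default_rng(3)
n, lam = 6, 0.3
B = rng.normal(size=(n, n))
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])
f = rng.normal(size=n)
x, A, b = f - y, B, -B @ y      # substitution used in Section III-C
```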

Fortunately, as we have mentioned in Section III-B, the dimension of our L1-norm semi-supervised learning can be reduced dramatically by working only with a small subset of eigenvectors of $\mathcal{L}$. That is, similar to [9], [36], we significantly reduce the dimension of $f$ by requiring it to take the form $f = V_m \alpha$, where $V_m$ is an $n \times m$ matrix whose columns are the $m$ eigenvectors with the smallest eigenvalues (i.e., the first $m$ columns of $V$), which simultaneously ensures that $f$ is as smooth as possible in terms of our L1-norm smoothness. According to equation (7), the objective function of our L1-SSL can now be formulated as follows:

$$\begin{aligned} Q(\alpha) &= \frac{1}{2}\|V_m \alpha - y\|_2^2 + \lambda \|(\Sigma^{1/2} V^T)(V_m \alpha)\|_1 \\ &= \frac{1}{2}\|V_m \alpha - y\|_2^2 + \lambda \Big\|\sum_{i=1}^m \Sigma^{1/2} (V^T V_{\cdot i}) \alpha_i\Big\|_1 \\ &= \frac{1}{2}\|V_m \alpha - y\|_2^2 + \lambda \sum_{i=1}^m \Sigma_{ii}^{1/2} |\alpha_i|. \end{aligned} \tag{11}$$

The first term of $Q(\alpha)$ denotes the linear reconstruction error, while the second term denotes the weighted L1-norm sparsity regularization over the reconstruction coefficients. That is, our L1-norm semi-supervised learning has been successfully transformed into a generalized sparse coding problem.
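The last equality in equation (11) relies on $B V_m \alpha = \Sigma^{1/2} V^T V_m \alpha$, whose first $m$ entries are $\Sigma_{ii}^{1/2}\alpha_i$ and whose remaining entries vanish because the columns of $V$ are orthonormal. The sketch below (hypothetical names, small dense random graph) compares the reduced objective $Q(\alpha)$ with equation (7) evaluated at $f = V_m \alpha$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, lam = 10, 4, 0.5
A = rng.random((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.eye(n) - W / np.sqrt(np.outer(d, d))
evals, V = np.linalg.eigh(L)
evals = np.clip(evals, 0.0, None)
B = np.sqrt(evals)[:, None] * V.T        # full B, built here only for the comparison
Vm = V[:, :m]                            # eigenvectors with the m smallest eigenvalues
wts = np.sqrt(evals[:m])                 # per-coefficient weights Sigma_ii^{1/2}


def q_alpha(alpha, y):
    """Reduced objective of equation (11): a weighted-L1 sparse coding problem."""
    return 0.5 * np.sum((Vm @ alpha - y) ** 2) + lam * np.sum(wts * np.abs(alpha))


y = np.zeros(n)
y[0], y[-1] = 1.0, -1.0
alpha = rng.normal(size=m)
f = Vm @ alpha
q_full = 0.5 * np.sum((f - y) ** 2) + lam * np.abs(B @ f).sum()   # equation (7)
```

Note that `q_alpha` never touches $B$: only $V_m$ and the $m$ weights are needed, which is what makes the reduced problem cheap.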

The formulation $f = V_m \alpha$ used in equation (11) has two distinct advantages. First, we can derive a linear reconstruction problem from the original semi-supervised learning problem, and correspondingly we can interpret our L1-norm semi-supervised learning in the framework of sparse coding. This also provides further insight into Laplacian regularization. In fact, the second term of $Q(\alpha)$ corresponds to both Laplacian regularization and sparsity regularization. By unifying these two types of regularization, we obtain a novel L1-norm semi-supervised learning method. Second, since $Q(\alpha)$ is minimized with respect to $\alpha \in \mathbb{R}^m$ ($m \ll n$), we can readily develop fast sparse coding algorithms for our L1-norm semi-supervised learning. That is, although many sparse coding algorithms scale polynomially with $m$, they have linear time complexity with respect to the data size $n$. More importantly, we have eliminated the need to compute the full matrix $B$ in equation (7), which is especially suitable for image analysis on large datasets. In fact, we only need to compute the $m$ smallest eigenvectors of $\mathcal{L}$. To speed up this step, we construct $k$-NN graphs for our L1-norm semi-supervised learning. Given a $k$-NN graph ($k \ll n$), the time complexity of finding the $m$ smallest eigenvectors of the sparse matrix $\mathcal{L}$ is $O(m^3 + m^2 n + kmn)$, which is scalable with respect to the data size $n$.

Fig. 2. (a) The two-moons toy dataset with each class having one incorrectly labeled data point (out of five) initially. (b) The classification results on this toy dataset by our fast L1-norm semi-supervised learning algorithm. Different from traditional semi-supervised learning, the negative effect of noisy labels can be completely suppressed by our new L1-norm semi-supervised learning when noisy initial labels are provided.

Algorithm 1 Fast L1-SSL Algorithm

Input: the initial label vector $y$, the weight matrix $W$ of the $k$-NN graph, the number of smallest eigenvectors $m$, and the regularization parameter $\lambda$
Output: the predicted labels $\mathrm{sign}(f^*)$
Step 1. Compute the normalized Laplacian matrix $\mathcal{L} = I - D^{-1/2} W D^{-1/2}$, where $D = \mathrm{diag}\{\sum_j w_{ij}\}$.
Step 2. Find the $m$ smallest eigenvectors of $\mathcal{L}$ and store them in $V_m$.
Step 3. Solve the L1-norm optimization problem $\alpha^* = \arg\min_\alpha Q(\alpha)$ using the modified FISTA.
Step 4. Compute $f^* = V_m \alpha^*$.
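Algorithm 1 can be sketched end to end as follows. This is a minimal dense NumPy illustration under several assumptions of ours: the $k$-NN graph uses Gaussian weights and is symmetrized with $\max(W, W^T)$; Step 2 uses a dense eigensolver (for large sparse graphs one would instead call a sparse solver such as `scipy.sparse.linalg.eigsh`); and Step 3 uses FISTA with the fixed step $1/\|V_m\|_s^2$ and the per-coefficient threshold of equation (12) rather than backtracking. The helper names are hypothetical.

```python
import numpy as np


def knn_gaussian_graph(X, k, sigma):
    """Symmetric k-NN graph with Gaussian weights (assumed construction)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    rows, cols = np.repeat(np.arange(len(X)), k), nbrs.ravel()
    W = np.zeros_like(d2)
    W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)                        # keep W symmetric


def soft(v, t):
    """Element-wise soft-thresholding sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fast_l1_ssl(y, W, m, lam, n_iter=300):
    n = W.shape[0]
    d = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(d, d))      # Step 1
    evals, V = np.linalg.eigh(L)                     # Step 2 (ascending eigenvalues)
    Vm = V[:, :m]
    wts = np.sqrt(np.clip(evals[:m], 0.0, None))
    lip = np.linalg.norm(Vm, 2) ** 2                 # ||V_m||_s^2 (1 for orthonormal columns)
    alpha = np.zeros(m)
    z, t = alpha.copy(), 1.0
    for _ in range(n_iter):                          # Step 3: FISTA
        grad = Vm.T @ (Vm @ z - y)
        alpha_new = soft(z - grad / lip, lam * wts / lip)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = alpha_new + ((t - 1.0) / t_new) * (alpha_new - alpha)
        alpha, t = alpha_new, t_new
    return Vm @ alpha                                # Step 4: f* = V_m alpha*


# Two well-separated clusters of five points, one labeled point in each.
base = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.25, 0.25]])
X = np.vstack([base, base + 10.0])
y = np.zeros(10)
y[0], y[5] = 1.0, -1.0
W = knn_gaussian_graph(X, k=3, sigma=1.0)
f = fast_l1_ssl(y, W, m=2, lam=0.1)
labels = np.sign(f)
```

In this toy case the two smallest eigenvectors span the two cluster indicators (scaled by $D^{1/2}$), so each label spreads over its own cluster only.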

In theory, any fast sparse coding algorithm can be adopted to solve the L1-norm optimization problem $\min_\alpha Q(\alpha)$. In this paper, we only consider the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [19], since its implementation mainly involves lightweight operations such as vector operations and matrix-vector multiplications. To adapt the original FISTA to our L1-norm semi-supervised learning, we only need to modify the soft-thresholding function as

$$\mathrm{soft}\left(\alpha_i, \frac{\lambda \Sigma_{ii}^{1/2}}{\|V_m\|_s^2}\right) = \mathrm{sign}(\alpha_i) \max\left\{|\alpha_i| - \frac{\lambda \Sigma_{ii}^{1/2}}{\|V_m\|_s^2}, 0\right\}, \tag{12}$$

where $\|V_m\|_s$ denotes the spectral norm of the matrix $V_m$. For large problems, it is often computationally expensive to directly compute the Lipschitz constant $\|V_m\|_s^2$. In practice, it can be efficiently estimated by a backtracking line-search strategy [19]. The complete algorithm for our fast L1-norm semi-supervised learning is outlined in Algorithm 1. Since both Step 2 and Step 3 are scalable with respect to the data size $n$, our algorithm can be applied to large problems.
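When $\|V_m\|_s^2$ is not computed directly, the backtracking line search of [19] can be combined with the modified soft-thresholding of equation (12). The sketch below is one standard way to write FISTA with backtracking for $\min_\alpha \frac{1}{2}\|V_m\alpha - y\|_2^2 + \lambda\sum_i \Sigma_{ii}^{1/2}|\alpha_i|$; the initial estimate `lip0`, the growth factor `eta`, and the small round-off tolerance in the acceptance test are illustrative choices of ours, not values from the paper. Because the weights differ per coefficient, the threshold is a vector, which is the only change relative to plain FISTA.

```python
import numpy as np


def soft(v, t):
    """Element-wise soft-thresholding with a per-coefficient threshold t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fista_backtracking(Vm, y, wts, lam, lip0=1e-3, eta=2.0, n_iter=200):
    """FISTA with backtracking for 1/2 ||Vm a - y||^2 + lam * sum_i wts_i |a_i|."""
    def smooth_part(a):
        r = Vm @ a - y
        return 0.5 * r @ r

    alpha = np.zeros(Vm.shape[1])
    z, t, lip = alpha.copy(), 1.0, lip0
    for _ in range(n_iter):
        r = Vm @ z - y
        fz, grad = 0.5 * r @ r, Vm.T @ r
        while True:  # grow lip until the quadratic upper bound holds at p
            p = soft(z - grad / lip, lam * wts / lip)     # modified soft-thresholding
            diff = p - z
            bound = fz + grad @ diff + 0.5 * lip * (diff @ diff)
            if smooth_part(p) <= bound + 1e-12 * (1.0 + abs(fz)):  # tolerance for round-off
                break
            lip *= eta
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = p + ((t - 1.0) / t_new) * (p - alpha)
        alpha, t = p, t_new
    return alpha


rng = np.random.default_rng(5)
Vm, _ = np.linalg.qr(rng.normal(size=(20, 5)))   # orthonormal columns, like eigenvectors
y = rng.normal(size=20)
wts = np.array([0.0, 0.1, 0.5, 1.0, 2.0])         # stand-ins for Sigma_ii^{1/2}
lam = 0.5
alpha = fista_backtracking(Vm, y, wts, lam)
alpha_exact = soft(Vm.T @ y, lam * wts)
```

With orthonormal columns the problem decouples, and its exact minimizer is $\mathrm{soft}(V_m^T y, \lambda \Sigma_{ii}^{1/2})$; `alpha_exact` uses this as a reference for checking the iterative solver.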
Fig. 3. The quantitative comparison between L2-SSL and L1-SSL on the two-moons toy dataset given by Fig. 2(a). Here, $f^*$ denotes the best solution found by L2-SSL or L1-SSL when $\lambda$ is set to the same value. The best solution found by our L1-SSL is shown to be extremely smooth no matter which smoothness measure is considered, which is not true for L2-SSL.

The classification results of our L1-SSL algorithm on the two-moons toy dataset are shown in Fig. 2. We find that our algorithm can handle the problem (see Fig. 1(b)) associated with traditional L2-SSL when noisy labels are provided initially. That is, our L1-SSL algorithm can benefit from the sparsity induced by L1-norm optimization and thus effectively suppress the negative effect of noisy labels. To give more convincing verification of this noise-robustness advantage, we further show a quantitative comparison between L2-SSL and L1-SSL in Fig. 3. The best solution $f^*$ found by L2-SSL or L1-SSL is evaluated here by five quantitative measures, including the fitting error $\|f^* - y\|_2$, the traditional smoothness $\|Bf^*\|_2^2$, and the L1-norm smoothness $\|Bf^*\|_1$.

We can clearly observe from Fig. 3 that the best solution $f^*$ found by L2-SSL is not smooth in terms of $\|Bf^*\|_1$, although it is smooth in terms of $\|Bf^*\|_2^2$. That is, by considering the L1-norm smoothness measure, we have indirectly shown that L2-SSL can be severely misled by noisy labels (consistent with Fig. 1). Here, it is worth noting that the less smooth a solution is, the poorer its generalization ability (and thus the more likely it is to be misled by noise). In contrast, the best solution $f^*$ found by our L1-SSL is shown to be extremely smooth no matter which smoothness measure is considered. Hence, while keeping the fitting error at a low level, our L1-SSL has successfully suppressed the negative effect of noisy labels (consistent with Fig. 2).
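The three measures named above can be computed as in the sketch below; the remaining two of the five measures are not specified in this section, so they are omitted. The function name `solution_measures` is hypothetical, and $B$ is formed densely, which is only practical for small toy data. The usage compares a smooth eigenvector with the roughest one on a random graph, rather than reproducing the two-moons data of Fig. 3.

```python
import numpy as np


def solution_measures(f, y, B):
    """Fitting error and the two smoothness measures discussed for Fig. 3."""
    Bf = B @ f
    return {
        "fitting_error": float(np.linalg.norm(f - y)),   # ||f* - y||_2
        "l2_smoothness": float(np.sum(Bf ** 2)),          # ||B f*||_2^2 = f*^T L f*
        "l1_smoothness": float(np.abs(Bf).sum()),         # ||B f*||_1
    }


rng = np.random.default_rng(6)
n = 8
A = rng.random((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.eye(n) - W / np.sqrt(np.outer(d, d))
evals, V = np.linalg.eigh(L)
B = np.sqrt(np.clip(evals, 0.0, None))[:, None] * V.T

y = np.zeros(n)
y[0], y[1] = 1.0, -1.0
smooth = solution_measures(V[:, 1], y, B)   # eigenvector with a small eigenvalue
rough = solution_measures(V[:, -1], y, B)   # eigenvector with the largest eigenvalue
```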
Fig. 4. The flowchart of sparse co-refinement of visual and textual BOW models by our L1-norm semi-supervised learning with only a linear kernel.



IV. Applications to Robust Image Analysis 

Considering the distinct advantage (i.e., noise robustness) of the proposed L1-SSL algorithm, we apply it to two challenging tasks of robust image analysis: noise-robust semi-supervised image classification and noise reduction for both visual and textual BOW models. Although only tested in these two applications, the proposed L1-SSL algorithm can be similarly extended to other image analysis tasks, since semi-supervised learning has been widely used in the literature.

A. Noise-Robust Semi-Supervised Image Classification 

As the basis of many image analysis tasks such as image annotation and retrieval, semi-supervised image classification has been extensively studied in the literature [?]. In these applications, the manual labeling of training data is often tedious and expensive, while unlabeled data are much easier to obtain. The original motivation of semi-supervised image classification is precisely to reduce the need for expensive labeled data by exploiting the large amount of unlabeled data.

In this paper, we consider a more challenging problem, i.e., semi-supervised image classification with both correctly and incorrectly labeled data. In general, noisy labels may arise from the subjective manual labeling of training data. Fortunately, this challenging problem can be addressed to some extent by our L1-SSL algorithm. As we have mentioned, our L1-SSL algorithm can benefit from the sparsity induced by L1-norm optimization and thus effectively suppress the negative effect of noisy labels. Since we focus on providing convincing verification of this noise-robustness advantage, we directly apply our L1-SSL algorithm to semi-supervised image classification with noisy initial labels, without any preprocessing or postprocessing techniques. Hence, we only need to extend Algorithm 1 to the multi-class problems commonly encountered in image analysis, which is elaborated in the following.

We first formulate a multi-class semi-supervised classification problem in the same way as [9]. Given a dataset $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$ and a label set $\{1, \ldots, C\}$ ($C$ is the number of classes), the first $l$ data points $x_i$ ($i \le l$) are labeled as $y_{ij} = 1$ if $x_i$ belongs to class $j$ ($1 \le j \le C$) and $y_{ij} = 0$ otherwise, while the remaining data points $x_u$ ($l + 1 \le u \le n$) are unlabeled with $y_{uj} = 0$. The goal of semi-supervised classification is to predict the labels of the unlabeled data points, i.e., to find a matrix $F = [f_{ij}]_{n \times C}$ corresponding to a classification on the dataset $X$ by labeling each data point $x_i$ with the label $\arg\max_{1 \le j \le C} f_{ij}$. Let $Y = [y_{ij}]_{n \times C}$; we can readily observe that $Y$ is exactly consistent with the initial labels according to this decision rule. When noisy initial labels are provided for semi-supervised classification, some entries of $Y$ may be inconsistent with the ground truth.

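Based on the above notation, we further formulate our multi-class L1-SSL problem as

$$\min_F Q(F) = \min_F \frac{1}{2}\|F - Y\|_F^2 + \lambda \|BF\|_1, \tag{13}$$

where $\|BF\|_1$ denotes the sum of the absolute values of all entries of $BF$. This problem can therefore be decomposed into $C$ independent subproblems:

$$\min_{F_{\cdot j}} Q(F_{\cdot j}) = \min_{F_{\cdot j}} \frac{1}{2}\|F_{\cdot j} - Y_{\cdot j}\|_2^2 + \lambda \|BF_{\cdot j}\|_1, \tag{14}$$

where $F_{\cdot j}$ and $Y_{\cdot j}$ denote the $j$-th columns of $F$ and $Y$, respectively. Since each subproblem $\min_{F_{\cdot j}} Q(F_{\cdot j})$ can be regarded as a two-class problem, we can readily solve it by Algorithm 1. Let $F^*_{\cdot j} = \arg\min_{F_{\cdot j}} Q(F_{\cdot j})$; we then classify $x_i$ into class $\arg\max_{1 \le j \le C} f^*_{ij}$.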
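Because the squared Frobenius norm and the entrywise L1-norm both split over columns, Steps 1 and 2 of Algorithm 1 can be shared by all $C$ subproblems of equation (14), and only Step 3 is repeated per class. The sketch below assumes a dense eigensolver and fixed-step FISTA; the function name `multiclass_l1_ssl` and the three-cluster toy data are illustrative.

```python
import numpy as np


def soft(v, t):
    """Element-wise soft-thresholding sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def multiclass_l1_ssl(Y, W, m, lam, n_iter=300):
    """Solve the C subproblems of equation (14) and classify by argmax_j f*_ij."""
    n, C = Y.shape
    d = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(d, d))
    evals, V = np.linalg.eigh(L)                      # shared by all classes
    Vm, wts = V[:, :m], np.sqrt(np.clip(evals[:m], 0.0, None))
    lip = np.linalg.norm(Vm, 2) ** 2
    F = np.zeros((n, C))
    for j in range(C):                                # one two-class problem per column
        alpha = np.zeros(m)
        z, t = alpha.copy(), 1.0
        for _ in range(n_iter):
            grad = Vm.T @ (Vm @ z - Y[:, j])
            alpha_new = soft(z - grad / lip, lam * wts / lip)
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = alpha_new + ((t - 1.0) / t_new) * (alpha_new - alpha)
            alpha, t = alpha_new, t_new
        F[:, j] = Vm @ alpha
    return F, F.argmax(axis=1)


# Three well-separated clusters of four points; the first point of each is labeled.
offsets = np.array([[0.0, 0.0], [0.4, 0.0], [0.0, 0.4], [0.4, 0.4]])
X = np.vstack([offsets, offsets + [20.0, 0.0], offsets + [0.0, 20.0]])
sq = np.sum(X ** 2, axis=1)
W = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)
np.fill_diagonal(W, 0.0)
Y = np.zeros((12, 3))
Y[0, 0] = Y[4, 1] = Y[8, 2] = 1.0                     # y_ij = 1 for labeled points
F, labels = multiclass_l1_ssl(Y, W, m=3, lam=0.05)
```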

B. Sparse Co-Refinement of Visual and Textual BOW Models 

We further turn to visual and textual BOW refinement to obtain robust image representation, which differs from semi-supervised image classification, a high-level semantic analysis task. Although both visual and textual BOW models have been shown to achieve impressive results, each BOW model has its own drawbacks. First, since the visual BOW model generally creates a visual vocabulary by clustering the local descriptors extracted from images, the visual vocabulary may be far from accurate due to the inherent limitations of clustering, and thus the labels of local descriptors may be rather noisy. This means that visual BOW refinement is crucial for the success of BOW-based image analysis tasks. Second, instead of relying on expensive manual labeling of images, the textual BOW model for image representation is commonly based upon image tags contributed by the community (e.g., Flickr) or automatically derived from the associated text (e.g., Web pages). Because the tags of an image obtained in these ways may be incorrect and incomplete, textual BOW refinement becomes a rather challenging problem.

To address the above problems, we propose a novel framework for sparse co-refinement of visual and textual BOW models based on our L1-SSL algorithm, as shown in Fig. 4. Our basic idea is to formulate BOW refinement as a multi-class semi-supervised learning (SSL) problem by regarding each word of the BOW model as a "class", so that our noise-robust L1-SSL algorithm can be applied to noise reduction for both visual and textual BOW models. Since textual BOW refinement is the dual problem of visual BOW refinement, we focus on visual BOW refinement in the following.

Let $Y \in \mathbb{R}^{n \times M}$ be the visual BOW representation and let $B \in \mathbb{R}^{n \times n}$ be computed from the textual BOW representation, where $n$ is the number of images and $M$ is the number of visual words. To compute $B$ according to equation (6), we only consider a linear kernel matrix defined on the textual BOW representation, which is used as the weight matrix $W$. The visual BOW refinement problem can then be formulated as:

$$\min_{F} \frac{1}{2}\|F - Y\|_F^2 + \lambda \|BF\|_1 + \gamma \|F - Y\|_1, \tag{15}$$

where $\lambda$ and $\gamma$ are two regularization parameters. Compared with the L1-SSL problem in equation (13), the only difference is the additional L1-norm regularization term $\|F - Y\|_1$ used for visual BOW refinement.
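Equation (6) is not reproduced in this section, so the sketch below stops at the ingredients it needs: it builds the linear-kernel weight matrix $W = TT^\top$ from a textual BOW matrix $T$ (images × tags), forms a normalized Laplacian, and returns its $m$ smallest eigenpairs. The symmetric normalization $I - D^{-1/2} W D^{-1/2}$, the removal of self-loops, the choice of the smallest eigenvalues, and the omission of the k-NN sparsification used in the experiments are all assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np


def laplacian_eigenpairs(T, m):
    """Linear-kernel weights W = T T^T, symmetric normalized Laplacian, m smallest eigenpairs.

    T: (n_images, n_tags) textual BOW matrix. Self-loop removal and the symmetric
    normalization are assumptions made for this sketch.
    """
    W = T @ T.T                          # linear kernel used as the weight matrix W
    np.fill_diagonal(W, 0.0)             # assumption: no self-loops
    d = W.sum(axis=1)
    safe_d = np.where(d > 0, d, 1.0)     # avoid division by zero for isolated images
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(safe_d), 0.0)
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L)     # eigh returns eigenvalues in ascending order
    return evals[:m], evecs[:, :m]


# Toy usage: images {0, 1} share tags, images {2, 3} share a different tag.
T = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
evals, evecs = laplacian_eigenpairs(T, m=2)
```

The two disconnected tag groups yield a zero eigenvalue of multiplicity two, the usual sign that the smallest eigenvectors encode the cluster structure the regularizer relies on.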

As mentioned in Section III-B, we do not formulate the fitting constraint as an L1-norm term for our L1-SSL, because the predicted labels would be too sparse when very few labeled data are initially provided. Such sparsity of the predicted labels directly conflicts with the original goal of semi-supervised learning. The case is quite different for visual BOW refinement: a large amount of initially labeled data is available, since each visual word can be assigned to many images. Hence, the predicted labels need not be sparse even if the L1-norm fitting constraint is used. Our main motivation for adding $\|F - Y\|_1$ is to induce sparsity in the fitting error $F - Y$ and thus directly reduce the noise in $Y$.

Although the visual BOW refinement problem is convex and has a unique global solution, it is not easy to develop an efficient algorithm for solving it directly. Fortunately, we can solve it approximately in two L1-norm optimization steps:

(1) $Y^* = \arg\min_{F} \frac{1}{2}\|F - Y\|_F^2 + \lambda \|BF\|_1$;
(2) $F^* = \arg\min_{F} \frac{1}{2}\|F - Y^*\|_F^2 + \gamma \|F - Y\|_1$.

The first subproblem can be efficiently solved by our L1-SSL algorithm. The second has an explicit solution: substituting $G = F - Y$ turns it into $\min_G \frac{1}{2}\|G - (Y^* - Y)\|_F^2 + \gamma\|G\|_1$, which decouples over entries and is solved by the soft-thresholding function:

$$F^* = \mathrm{soft}(Y^* - Y, \gamma) + Y, \tag{16}$$

where the definition of $\mathrm{soft}(\cdot, \cdot)$ can be found in equation (12). Since our L1-SSL algorithm scales linearly with the data size and step (2) is an elementwise operation, the visual BOW refinement problem can be solved at a linear time cost.
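The closed form (16) can be checked numerically. This sketch assumes that $\mathrm{soft}(\cdot, \cdot)$ in equation (12) is the standard elementwise soft-thresholding operator $\mathrm{sign}(x)\max(|x| - t, 0)$; the matrix `Y_star` is a hypothetical stand-in for the output of step (1), and the function names are ours.

```python
import numpy as np


def soft(x, t):
    """Elementwise soft-thresholding (assumed form of soft(., .) in equation (12))."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def refine_step2(Y_star, Y, gamma):
    """Step (2): argmin_F 0.5*||F - Y*||_F^2 + gamma*||F - Y||_1, i.e. equation (16)."""
    return soft(Y_star - Y, gamma) + Y


def step2_objective(F, Y_star, Y, gamma):
    """Objective of step (2), used only to verify the closed form."""
    return 0.5 * np.sum((F - Y_star) ** 2) + gamma * np.sum(np.abs(F - Y))


# Toy usage: noisy 0/1 BOW matrix Y and a hypothetical step-(1) output Y_star.
Y = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
Y_star = np.array([[0.90, 0.30, 0.20],
                   [0.02, 0.95, 0.60]])
F = refine_step2(Y_star, Y, gamma=0.1)
# Entries with |Y* - Y| <= gamma are snapped back to Y (sparse fitting error F - Y);
# larger deviations move toward Y_star, shrunk by gamma.
```

Here `F` equals `[[1.0, 0.2, 0.3], [0.0, 1.0, 0.5]]`: three of the six entries keep their original BOW values exactly, which is the fitting-error sparsity that motivates the $\|F - Y\|_1$ term.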

As the dual problem, textual BOW refinement can be formulated in the same form as equation (15) by computing $B$ from the visual BOW representation instead. In summary, we have solved the challenging problem of sparse co-refinement of visual and textual BOW models based on our L1-SSL algorithm. To the best of our knowledge, this is the first attempt to obtain robust image representation by sparse co-refinement of visual and textual BOW models, which is particularly important for robust image analysis on community-contributed image collections (e.g. Flickr). In contrast, most previous methods cannot deal with visual and textual BOW refinement simultaneously. For example, various supervised [29], [30] and unsupervised [31], [32] methods have been developed specifically for visual vocabulary optimization, while in [5], [33], [34] only tag refinement is considered for robust image analysis.

TABLE I
Details of the four image datasets, including two handwritten digit datasets and two natural image datasets.

Datasets   #samples   #features   #classes
MNIST      10,000     784         10
USPS       9,298      256         10
Corel      2,000      400         20
Scene      2,688      400         8

It is worth noting that the supervisory information used for visual vocabulary optimization in [29], [30] is usually expensive to obtain, whereas the image tags used for our visual BOW refinement are much easier to access (although noisy). Moreover, the use of image tags also distinguishes our visual BOW refinement method from the unsupervised methods [31], [32], which do not consider any high-level semantic information for visual vocabulary optimization. Compared with the closely related work [5] on tag refinement, which adopts only the traditional Laplacian regularization for semi-supervised learning, this paper formulates a new L1-norm Laplacian regularization, which is significant given the wide use of Laplacian regularization in the literature.

V. Experimental Results 

In this section, our L1-SSL algorithm is tested in two applications: noise-robust image classification and co-refinement of visual and textual BOW models. In particular, to show the descriptive power of the refined BOW models, we apply them to supervised classification with SVM, rather than to the semi-supervised classification used in the first application.

A. Noise-Robust Image Classification 

We evaluate our L1-SSL algorithm for noise-robust image classification on the four image datasets listed in Table I. We first describe the experimental setup and then compare our L1-SSL algorithm with other closely related methods.

1) Experimental Setup: Our L1-norm semi-supervised learning (L1-SSL) is compared with four other representative methods: (1) the traditional L2-norm semi-supervised learning (L2-SSL) [9], (2) Lasso-based L1-norm semi-supervised learning (Lasso-SSL) [24], (3) linear neighborhood propagation (LNP) [12], and (4) the support vector machine (SVM). For an extensive comparison, we conduct two groups of experiments: semi-supervised classification with a varying number of clean initial labeled images, and noise-robust classification with a varying percentage of noisy initial labeled images. The test accuracies on the unlabeled images are averaged over 25 independent runs and used for performance evaluation.
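The evaluation protocol above can be sketched as follows. The text does not specify how labeled images are sampled or how noisy labels are generated, so this sketch assumes uniform (non-stratified) sampling of 5 × #classes labeled images, the budget stated for the noisy-label experiments of Fig. 7, and replaces each corrupted label with a different class chosen uniformly at random. It also assumes a normal-approximation 95% interval (1.96 × standard error over the runs) for the error bars. All names are hypothetical.

```python
import numpy as np


def make_initial_labels(labels, n_classes, budget_per_class=5, noise_rate=0.0, seed=None):
    """Sample budget_per_class * n_classes labeled points (total budget, not stratified),
    corrupt a fraction noise_rate of them, and build the n x C label matrix Y."""
    rng = np.random.default_rng(seed)
    n = labels.shape[0]
    n_lab = budget_per_class * n_classes
    idx = rng.choice(n, size=n_lab, replace=False)       # assumption: uniform sampling
    observed = labels[idx].copy()
    n_noisy = int(round(noise_rate * n_lab))
    flip = rng.choice(n_lab, size=n_noisy, replace=False)
    # Adding 1..C-1 modulo C guarantees that every corrupted label is wrong.
    observed[flip] = (observed[flip] + rng.integers(1, n_classes, size=n_noisy)) % n_classes
    Y = np.zeros((n, n_classes))
    Y[idx, observed] = 1.0                               # y_ij = 1 iff x_i is labeled as j
    return idx, observed, Y


def mean_ci95(accuracies):
    """Mean accuracy and normal-approximation 95% half-width over independent runs."""
    a = np.asarray(accuracies, dtype=float)
    return a.mean(), 1.96 * a.std(ddof=1) / np.sqrt(a.size)


# Toy usage: 1,000 points in 10 classes, 20% noisy initial labels.
labels = np.repeat(np.arange(10), 100)
idx, observed, Y = make_initial_labels(labels, 10, noise_rate=0.2, seed=0)
```

In a full experiment, `make_initial_labels` would be called once per run with a different seed, the resulting `Y` passed to each SSL method, and the 25 test accuracies summarized with `mean_ci95`.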

We adopt two different approaches to kernel matrix computation. First, for the two handwritten digit datasets (i.e. MNIST and USPS), we compute the Gaussian kernel matrix according to equation (1) with fixed $\sigma = 1$. Second, for the two natural image datasets (i.e. Scene and Corel), we compute the spatial Markov kernel matrix [37] based on 400 visual words (i.e. 400 features), exactly as in [38]. The kernel matrix can be used directly for SVM, while for semi-supervised learning it serves as the weight matrix from which a k-NN graph is constructed. For LNP, the k-NN graph is further refined by quadratic programming [12].

Fig. 5. Illustration of the effect of different parameters on our L1-SSL algorithm with 50 clean initial labeled images for the two handwritten digit datasets. First Row: MNIST. Second Row: USPS.

Fig. 6. The classification results (%) on the four image datasets by different algorithms when a varying number of clean labeled images are initially provided. The error bar indicates the 95% confidence interval.

We empirically select $k = 4$, $\lambda = 0.01$, and $m = 20$ for our L1-SSL algorithm on the two handwritten digit datasets. Note that both $k$ and $m$ are determined by the tradeoff between running efficiency and classification performance, i.e., we always prefer smaller $k$ and $m$ for our L1-SSL algorithm when this causes only slight performance degradation. More importantly, as shown in Fig. 5, our L1-SSL algorithm is generally not very sensitive to these parameters. The same parameter-selection strategy is adopted for our L1-SSL algorithm on the two natural image datasets, and the parameters of the other algorithms are likewise set to their respective optimal values.
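The kernel and graph construction described above can be sketched as follows. Equation (1) is not reproduced in this section, so the Gaussian kernel is assumed to take the common form $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$. Likewise, keeping each node's $k$ most similar neighbors and symmetrizing with an elementwise maximum is one common k-NN rule, assumed here rather than taken from the paper. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, sigma=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); this form of equation (1) is assumed."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def knn_graph(K, k=4):
    """Keep, for each node, the weights to its k most similar other nodes, then
    symmetrize with an elementwise max so the graph is undirected."""
    n = K.shape[0]
    S = K.astype(float).copy()
    np.fill_diagonal(S, -np.inf)            # exclude self-loops from the neighbor search
    nbrs = np.argsort(-S, axis=1)[:, :k]    # indices of the k largest similarities per row
    W = np.zeros_like(K, dtype=float)
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W[rows, cols] = K[rows, cols]
    return np.maximum(W, W.T)


# Toy usage: two well-separated groups of 1-D points, k = 2 neighbors each.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
W = knn_graph(gaussian_kernel(X, sigma=1.0), k=2)
```

The two groups end up in separate connected components, which is the kind of local structure a sparse k-NN graph preserves and a dense kernel matrix blurs.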

2) Classification Results: Although our original motivation is to apply L1-SSL to noise-robust classification, we first compare the five methods in the less challenging task of semi-supervised classification with clean initial labeled images, to verify their effectiveness in dealing with the scarcity of labeled images. The comparison results are shown in Fig. 6, where the 95% confidence intervals are also provided. In general, our L1-SSL consistently performs the best among all five methods. The reason may be that L1-SSL benefits from the sparsity induced by our L1-norm Laplacian regularization, which suppresses the negative effect of the complicated manifold structure hidden among images on semi-supervised classification. Note that the four image datasets have much more complicated structures than the two-moons toy dataset shown in Fig. 1(a), and these structures are challenging for the other four methods, including L2-SSL. Interestingly, although Lasso-SSL [24] also adopts an L1-norm semi-supervised learning strategy, it ignores the important Laplacian regularization and thus generally performs the worst among the four SSL methods.

Fig. 7. The classification results (%) on the four image datasets by different algorithms when a varying percentage of noisy labels are provided (among 5 × #classes initial labeled images in total). The error bar indicates the 95% confidence interval.

We make a further comparison in the challenging task of noise-robust classification with noisy initial labeled images. Since the four SSL methods have been shown to generally outperform SVM (see Fig. 6), we focus on verifying the effectiveness of noise reduction by semi-supervised learning in the following. The comparison results on noise-robust classification are shown in Fig. 7. We find that our L1-SSL consistently achieves significant gains over the other SSL methods, especially when more noisy labels are initially provided. That is, our L1-norm Laplacian regularization indeed helps to find a smooth and also sparse solution for semi-supervised learning, and thus effectively suppresses the negative effect of noisy labels. More importantly, although all four SSL methods suffer more performance degradation as the percentage of noisy labels grows, the performance of L1-SSL and Lasso-SSL degrades the slowest, since both utilize sparse coding for semi-supervised learning.

Besides the above advantages, our L1-SSL is also the most efficient: it runs the fastest among the four SSL algorithms. For example, the time taken by L1-SSL, L2-SSL, Lasso-SSL, and LNP on the MNIST dataset with 50 clean labeled images is 39, 57, 433, and 132 seconds, respectively. All algorithms (implemented in Matlab) are run on a server with a 3 GHz CPU and 31.9 GB RAM.
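For reference, these timings translate into the following speedups of L1-SSL over each competitor. They are simple ratios of the reported MNIST running times, rounded, and say nothing about variance across runs:

```latex
\frac{t_{\text{L2-SSL}}}{t_{\text{L1-SSL}}} = \frac{57}{39} \approx 1.46, \qquad
\frac{t_{\text{LNP}}}{t_{\text{L1-SSL}}} = \frac{132}{39} \approx 3.38, \qquad
\frac{t_{\text{Lasso-SSL}}}{t_{\text{L1-SSL}}} = \frac{433}{39} \approx 11.1
```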

B. Visual and Textual BOW Refinement 

In this subsection, our L1-SSL algorithm is applied to sparse co-refinement of visual and textual BOW models. To verify the descriptive power of the refined BOW models, we focus on evaluating them in SVM-based image classification. Note that the refined BOW models can also be readily applied to many other image analysis tasks, such as content-based and text-based image retrieval. Moreover, although the visual and textual BOW refinement problems can be solved by any SSL method (see Fig. 4), we only compare our L1-SSL with L2-SSL, since both have been shown to generally outperform the other SSL methods in the above experiments.

1) Experimental Setup: We conduct a group of experiments on a Flickr benchmark dataset [39], which consists of 8,564 images crawled from the photo-sharing website Flickr. This dataset is organized into eleven categories: airplane, auto, dog, turtle, elephant, NBA, laptop, piano, farm, city scape, and library. The high-level category labels of the images are used for SVM-based image classification. In the following experiments, we split this dataset into a training set of 4,282 images and a test set of the same size.

To obtain the visual BOW representation for the Flickr dataset, we extract SIFT descriptors of 16 × 16 pixel blocks computed over a regular grid with a spacing of 8 pixels. We then perform k-means clustering on the extracted descriptors to form a vocabulary of 2,000 visual words. Here, we deliberately make the visual BOW representation noisier by using a relatively large visual vocabulary. Moreover, we generate the textual BOW representation from the user-provided textual tags. As a preprocessing step, we remove stop words and check the remaining tags against WordNet to remove tags that do not exist in it. The final textual vocabulary contains only the 1,000 most frequent words.
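The grid sampling, quantization, and tag-filtering steps above can be sketched as follows. SIFT extraction itself is omitted and the visual vocabulary is taken as given; in practice it could come from any k-means implementation, for example the `cluster_centers_` attribute of scikit-learn's `KMeans(n_clusters=2000)` after fitting. For the textual side, a caller-supplied word set stands in for the WordNet lookup. All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def grid_positions(height, width, patch=16, step=8):
    """Top-left corners of patch x patch blocks on a regular grid with the given spacing."""
    return [(y, x)
            for y in range(0, height - patch + 1, step)
            for x in range(0, width - patch + 1, step)]


def bow_histogram(descriptors, vocabulary):
    """Hard-assign each descriptor to its nearest visual word and count occurrences."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=vocabulary.shape[0])


def textual_vocabulary(tag_lists, stop_words, known_words, size=1000):
    """Drop stop words and unknown tags (known_words stands in for WordNet),
    then keep the `size` most frequent remaining tags."""
    counts = Counter(t.lower() for tags in tag_lists for t in tags
                     if t.lower() not in stop_words and t.lower() in known_words)
    return [word for word, _ in counts.most_common(size)]


# Toy usage.
positions = grid_positions(64, 48)               # 7 rows x 5 columns of blocks
vocab = np.array([[0.0, 0.0], [10.0, 10.0]])     # hypothetical 2-word vocabulary
hist = bow_histogram(np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0], [11.0, 10.0]]), vocab)
tags = [["Dog", "the", "puppy"], ["dog", "zzz"], ["piano"]]
words = textual_vocabulary(tags, {"the"}, {"dog", "puppy", "piano"}, size=2)
```

Hard assignment to the nearest centroid is what makes the visual word labels noisy when clustering is inaccurate, which is the noise the refinement step targets.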

From these two BOW representations, we derive only linear kernels for our L1-SSL algorithm in the tasks of visual and textual BOW refinement. Based on the obtained kernel matrices, we further construct k-NN graphs for semi-supervised learning. The linear kernels are also used for the subsequent SVM classification. According to the twofold cross-validation results on the training set shown in Fig. 8, we set the parameters of our L1-SSL algorithm to their respective optimal values listed in Table II. As in noise-robust image classification, we determine both k and m by the tradeoff between running efficiency and classification performance. More importantly, Fig. 8 clearly shows that our L1-SSL algorithm is not very sensitive to these parameters in most cases.

Fig. 8. The twofold cross-validation results by our L1-SSL algorithm on the training set of the Flickr image dataset. First Row: visual BOW refined with textual BOW. Second Row: textual BOW refined with visual BOW.

TABLE II
The parameters selected by cross-validation for our L1-SSL algorithm in both visual and textual BOW refinement.

Parameters    k     λ       γ       m
Visual BOW    15    0.010   0.005   30
Textual BOW   15    0.005   0.075   35

TABLE III
The classification results (%) on the Flickr image dataset using different BOW models. Both L2-SSL and L1-SSL can be used for co-refinement of visual and textual BOW models.

Methods     Visual BOW   Textual BOW
Original    60.1         77.8
[5]         78.2         81.4
L2-SSL      84.3         81.5
L1-SSL      87.4         83.2
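The selection procedure above, twofold cross-validation followed by a preference for smaller k and m when the accuracy loss is slight, can be sketched as a generic grid search. The scoring callback, the tolerance `tol` that encodes "slight degradation", and the parameter names are hypothetical; in this sketch each score is assumed to come from fitting on one half of the training set and evaluating on the other half.

```python
import itertools

import numpy as np


def twofold_cv_select(n_train, grid, score_fn, prefer_small=("k", "m"), tol=0.0, seed=0):
    """Grid search with twofold cross-validation.

    score_fn(params, fit_idx, eval_idx) -> accuracy is a caller-supplied callback.
    Among settings whose mean score is within `tol` of the best, the one with the
    smallest values of `prefer_small` is returned (cheaper k and m first).
    """
    perm = np.random.default_rng(seed).permutation(n_train)
    halves = (perm[: n_train // 2], perm[n_train // 2:])
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Each half serves once for fitting and once for evaluation.
        scores = [score_fn(params, halves[a], halves[1 - a]) for a in (0, 1)]
        results.append((params, float(np.mean(scores))))
    top = max(score for _, score in results)
    near_top = [(p, s) for p, s in results if s >= top - tol]
    return min(near_top, key=lambda ps: tuple(ps[0].get(name, 0) for name in prefer_small))


# Toy usage with a made-up score table in place of real refinement + SVM runs.
toy_scores = {5: 0.800, 10: 0.845, 15: 0.850}


def toy_score_fn(params, fit_idx, eval_idx):
    return toy_scores[params["k"]]


grid = {"k": [5, 10, 15], "lam": [0.01]}
best_tol, _ = twofold_cv_select(4282, grid, toy_score_fn, tol=0.01)
best_exact, _ = twofold_cv_select(4282, grid, toy_score_fn, tol=0.0)
```

With `tol=0.01` the cheaper setting k = 10 is preferred over the marginally better k = 15, mirroring the efficiency/accuracy tradeoff described in the text; with `tol=0.0` the search reduces to plain cross-validation.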

2) Refinement Results: To show the effectiveness of visual and textual BOW refinement, we compare the BOW models refined by our L1-SSL with: (1) the original BOW models, (2) the BOW models refined by the SSL method proposed in [5], and (3) the BOW models refined by L2-SSL. The comparison results are listed in Table III. The immediate observation is that the BOW models refined by our L1-SSL lead to clear gains over the original BOW models, especially when the visual BOW model is refined with the textual BOW model (a gain of 27.3 percentage points, from 60.1% to 87.4%). This suggests that our L1-SSL for visual and textual BOW refinement indeed benefits from the sparsity induced by L1-norm optimization and thus effectively suppresses the noise in both visual and textual BOW models.
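The gains discussed here can be read off Table III as absolute differences (in percentage points) from the original BOW models:

```latex
\begin{aligned}
\text{Visual BOW:}\quad & 87.4 - 60.1 = 27.3\ (\text{L1-SSL}), && 84.3 - 60.1 = 24.2\ (\text{L2-SSL}), && 78.2 - 60.1 = 18.1\ ([5]) \\
\text{Textual BOW:}\quad & 83.2 - 77.8 = 5.4\ (\text{L1-SSL}), && 81.5 - 77.8 = 3.7\ (\text{L2-SSL}), && 81.4 - 77.8 = 3.6\ ([5])
\end{aligned}
```

The margin of L1-SSL over L2-SSL is therefore 3.1 points for the visual BOW model and 1.7 points for the textual BOW model.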

Moreover, Table III shows that L2-SSL also achieves promising results in visual and textual BOW refinement, although it was not originally developed for noise reduction. The reason may be that the visual (or textual) words associated with each image in the Flickr dataset are not only noisy but also incomplete due to inaccurate clustering (or subjective and limited manual labeling), and the issue of incomplete words can be effectively handled by word propagation based on L2-SSL. It is worth noting that, unlike the traditional L2-SSL, our L1-SSL is suitable for both word propagation and noise reduction. Hence, as shown in Table III, our L1-SSL performs better than L2-SSL in both visual and textual BOW refinement.

As for the SSL method [5], we find that it works nearly as well as L2-SSL in textual BOW refinement (81.4% vs. 81.5%), but leads to much worse results in visual BOW refinement (78.2% vs. 84.3%). Its promising performance in textual BOW refinement may be because it can perform both noise reduction and word propagation by imposing fitting-error sparsity on SSL. However, the case is different for visual BOW refinement, where the issue of incomplete words may be more severe because of the wrong word assignments that accompany inaccurate clustering. Compared with L2-SSL [9] (one of the most outstanding SSL methods), the SSL method [5] performs worse at visual word propagation and thus suffers from obvious degradation.

VI. Conclusions 

We have proposed a novel L1-norm semi-supervised learning method in this paper. Different from traditional graph-based SSL, which defines Laplacian regularization by a quadratic function, we have reformulated Laplacian regularization as an L1-norm term. More importantly, this new formulation is explicitly based on the manifold structure of the data. Owing to the resulting L1-norm optimization, our L1-SSL benefits from the nice property of sparsity and thus effectively suppresses the negative effect of noisy labels. Extensive results have shown the promising performance of our L1-SSL in two challenging tasks of robust image analysis. In future work, considering the wide use of Laplacian regularization, we will apply our new L1-norm Laplacian regularization to many other challenging problems. Moreover, the visual and textual BOW models refined by our L1-SSL will be evaluated in other image analysis tasks, such as content-based and text-based image retrieval.